DOCS-4086: code samples for building a good dataset #4413
✅ Deploy Preview for viam-docs ready!
Some of my comments are probably unintentionally out-of-diff on content that was repurposed; apologies. Did not review code in any detail since I assume that's all tested. Also GitHub seems to be bugging so will end review now lest comments not actually show up. LMK if I should provide code review (to the extent that I'm qualified to do so :D )
Co-authored-by: Jessamy Taylor <[email protected]>
Some mostly minor feedback - overall great direction.
The only bigger feedback I have is that this does create quite a lot of new pages. I am not entirely convinced we need quite that many. Creating a dataset, for example, is fairly short; should that just be an include and part of some of the other pages? Will need to think more about that.
The flow is hard because there are so many ways to do each step, and some of them can happen in any order (e.g. add to dataset then annotate or vice versa?), not just one linear path. The true flow chart is a pile of strands of spaghetti that each fork off into multiple ends. So I guess this is a plausible way and I don't currently have a better suggestion for how to present the path(s).
Noticed a couple more things; commented. Generally not blocking except maybe get image vs get images discrepancy in code samples?
Co-authored-by: Jessamy Taylor <[email protected]>
Co-authored-by: Naomi Pentrel <[email protected]>
I don't love the cards there because they go against the flow that the "next" buttons try to suggest. A sentence with links to the steps is less confusing, I think, because there are fewer boxes.
I have an alternate suggestion, why don't we do:
- Capture and annotate images
- Create a training dataset (which includes adding to a training dataset)
Capture and annotate could be separate, but I feel like that might make "Create a training dataset" less awkward?
docs/data-ai/train/update-dataset.md
Outdated
{{% /tab %}}
{{< /tabs >}}

## Capture, annotate, and add images to a dataset
This is still very odd to me. It essentially does both capture + annotate (which is the next page) and adding to a dataset, when we've split those across three pages. Either they're together and we have them on one page, or this doesn't make sense.
this snippet was inspired by the wine-pouring demo example linked to me as a good pattern, so i wanted to find a place for it. if you feel it doesn't fit into the flow of the pages as-is, would you rather i:
- removed this example entirely?
- rearranged the pages, perhaps combining add and annotate?
Reorganized according to your other comment; hopefully that helps!
yep this lgtm. I think keeping it is good, though maintenance will be painful, so future us might disagree
noting to self that we can significantly simplify this code when uploadfiletodataset happens
## Classify images with tags

Classification determines a descriptive tag or set of tags for an image.
an image here might be great and would convey this more immediately, but that doesn't necessarily need to happen with this PR
I'll brainstorm some ideas for this and submit a request
{{< alert title="Tip" color="tip" >}}

Unless you already have an ML model that can generate tags for your dataset, use the Web UI to annotate.
this is confusing, because it sort of begs the question: if I do have a model, what then? So it should link to that code, which maybe means that this should go to the annotate page: https://deploy-preview-4413--viam-docs.netlify.app/data-ai/train/update-dataset/#capture-annotate-and-add-images-to-a-dataset
I think I misunderstand you; isn't this already on the annotate page? I put this admonition here to help guide people into the appropriate tab in the tabset that follows. Where are you thinking we could link?
Regardless, I reworded this away from the question-begging 'unless', but let me know if I'm missing something else.
Co-authored-by: Naomi Pentrel <[email protected]>
Sorry, but the latest changes are also a bit confusing... suggested a possible product change 😬
Right now there is still capture and add to dataset content on two different pages which feels confusing.
It's hard to force a linear flow here since with all the different options for each step, the same order doesn't always apply:
- In most cases, like using the mobile app, uploading a batch, or adding existing images, you can get the data first and then create a dataset and add the data to it. Order doesn't matter, but it's easiest conceptually to think of getting data and then making a dataset with it.
- One exception: If you capture individual images through the UI and add them to a dataset on the spot, you have to have a dataset to add them to before you start capturing.
- In the script version of that same "Capture individual images" heading, it looks like you don't specify a dataset id, so you'd still have to add to a dataset later like with the other methods.
Suggestion:
- Implement Naomi's order suggestion, and:
- Get rid of the exception: Ask eng to change that capture button to not save to dataset but rather just save the image to your captured data. Just one single click, so you can capture more images in rapid succession, then add a batch all at once later. This would be a less clunky UX IMO, and also solve a docs flow problem.
- If this will take a while but they'll do it, don't worry about this flow for now; document the rest per Naomi's order
- If this can't/won't ever be changed in eng, document this as a thing you can do but shape the docs around the normal capture-then-make-a-dataset order
Co-authored-by: Jessamy Taylor <[email protected]>
Co-authored-by: Naomi Pentrel <[email protected]>
It looks like the following files may have been renamed. Please ensure you set all needed aliases:
Most non-SDK content is repurposed from the existing 'create a dataset' page.
Apologies for the large line-changed count; hard to avoid when you're splitting up pages and creating examples across multiple languages.